A Statistical Model for Automatic Extraction of Korean Transliterated Foreign Words

Authors

  • Jong-Hoon Oh
  • Key-Sun Choi
Abstract

In this paper, we describe an algorithm for extracting Korean transliterated foreign words. We reformulate the foreign word extraction problem as a syllable-tagging problem in which each syllable is tagged as either a foreign syllable or a pure Korean syllable. Syllable sequences of Korean strings are modeled by a Hidden Markov Model whose states each represent a character with a binary marking that indicates whether the syllable is part of a transliterated foreign word. The proposed method extracts transliterated foreign words with high recall and precision. Moreover, it performs well even with small training corpora.
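The syllable-tagging formulation can be illustrated with a minimal sketch. This is not the authors' model: the abstract describes states that pair a character with a binary mark, whereas the sketch below assumes a simpler two-state HMM (tag `F` for foreign, `K` for pure Korean) that emits syllables, with add-one smoothing and Viterbi decoding. The toy corpus, the function names (`train`, `viterbi`, `extract_foreign`), and the smoothing choice are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

TAGS = ("F", "K")  # F = transliterated foreign syllable, K = pure Korean syllable

# Hypothetical toy corpus: (syllable string, tag string), one tag per syllable.
TOY_CORPUS = [
    ("컴퓨터를샀다", "FFFKKK"),
    ("인터넷을쓴다", "FFFKKK"),
    ("나는버스를탔다", "KKFFKKK"),
]

def train(corpus):
    """Count tag transitions and syllable emissions from tagged strings."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for syllables, tags in corpus:
        prev = "<s>"  # sentence-start pseudo-tag
        for syl, tag in zip(syllables, tags):
            trans[prev][tag] += 1
            emit[tag][syl] += 1
            vocab.add(syl)
            prev = tag
    return trans, emit, vocab

def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (an assumption of this sketch)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))

def viterbi(syllables, model):
    """Return the most probable tag sequence for a list of syllables."""
    trans, emit, vocab = model
    if not syllables:
        return []
    n_syl = len(vocab) + 1  # one extra outcome reserved for unseen syllables
    n_tag = len(TAGS)
    score = {t: log_prob(trans["<s>"], t, n_tag)
                + log_prob(emit[t], syllables[0], n_syl) for t in TAGS}
    backpointers = []
    for syl in syllables[1:]:
        new_score, pointers = {}, {}
        for t in TAGS:
            best = max(TAGS, key=lambda p: score[p] + log_prob(trans[p], t, n_tag))
            new_score[t] = (score[best] + log_prob(trans[best], t, n_tag)
                            + log_prob(emit[t], syl, n_syl))
            pointers[t] = best
        score = new_score
        backpointers.append(pointers)
    tag = max(TAGS, key=lambda t: score[t])
    path = [tag]
    for pointers in reversed(backpointers):  # follow backpointers to the start
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path

def extract_foreign(syllables, tags):
    """Join maximal runs of F-tagged syllables into candidate foreign words."""
    words, current = [], []
    for syl, tag in zip(syllables, tags):
        if tag == "F":
            current.append(syl)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

model = train(TOY_CORPUS)
sentence = list("컴퓨터를쓴다")
predicted = viterbi(sentence, model)
print(predicted, extract_foreign(sentence, predicted))
```

On this toy corpus the decoder tags the first three syllables of the example as foreign, so the extracted candidate is 컴퓨터 ("computer"). A real system would need a far larger tagged corpus and would have to handle spacing and punctuation, which this sketch ignores.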

Similar resources

Transliteration Using a Network of Phoneme Chunks

In this paper, we present methods of transliteration and back-transliteration. In Korean technical documents and web documents, many English words and Japanese words are transliterated into Korean words. These transliterated words are usually technical terms and proper nouns, so it is hard to find them in a dictionary. Therefore an automatic transliteration system is needed. Previous transliter...

Transliterated Pairs Acquisition in Medical Hebrew

The phonetic transcription of a word from a source language using a different script is called transliteration. Transliterations affect Information Extraction (IE) in two ways. First, it takes time for a transliterated word to make it into a technical lexicon, making recognition difficult. A second problem is the variability of ways a foreign word can be rendered phonetically, leading in most c...

Identification of Transliterated Foreign Words in Hebrew Script

We present a loosely-supervised method for context-free identification of transliterated foreign names and borrowed words in Hebrew text. The method is purely statistical and does not require the use of any lexicons or linguistic analysis tool for the source languages (Hebrew, in our case). It also does not require any manually annotated data for training – we learn from noisy data acquired by ...

Japanese Term Extraction Using Dictionary Hierarchy and Machine Translation System

There have been many studies of automatic term recognition (ATR) and they have achieved good results. However, they focus on a mono-lingual term extraction method. Therefore, it is difficult to extract terms from documents in foreign languages. This paper describes an automatic term extraction method from documents in foreign languages using a machine translation system. In our method, we trans...

An English to Korean Transliteration Model of Extended Markov Window

The automatic transliteration problem is to transcribe foreign words in one's own alphabet. Machine-generated transliteration can be useful in various applications such as indexing in an information retrieval system and pronunciation synthesis in a text-to-speech system. In this paper we present a model for statistical English-to-Korean transliteration that generates transliteration candidates with ...

Journal title:
  • Int. J. Comput. Proc. Oriental Lang.

Volume 16

Publication date: 2003